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There is a surge in medical follow-up studies that include longi- 
tudinal covariates in the modeling of survival data. So far, the focus 
has been largely on right-censored survival data. We consider sur- 
vival data that are subject to both left truncation and right censor- 
ing. Left truncation is well known to produce biased sample. The 
sampling bias issue has been resolved in the literature for the case 
which involves baseline or time-varying covariates that are observ- 
able. The problem remains open, however, for the important case 
where longitudinal covariates are present in survival models. A joint 
likelihood approach has been shown in the literature to provide an ef- 
fective way to overcome those difficulties for right-censored data, but 
this approach faces substantial additional challenges in the presence 
of left truncation. Here we thus propose an alternative likelihood to 
overcome these difficulties and show that the regression coefficient in 
the survival component can be estimated unbiasedly and efficiently. 
Issues about the bias for the longitudinal component are discussed. 
The new approach is illustrated numerically through simulations and 
data from a multi-center AIDS cohort study. 

1. Introduction. Since the seminal paper by Wulfsohn and Tsiatis (1997), 
longitudinal covariates have played an increasingly important role in the 
modeling of survival data. One major challenge to incorporate longitudinal 
covariates is that simple approaches, such as the partial likelihood method 
for the Cox proportional hazards model [Cox (1972)], often require knowl- 
edge of the entire longitudinal process. This is often not feasible in reality 
for follow-up checks at discrete and intermittent time points. A common 
practice is to impute the values of the missing longitudinal processes and 
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then apply the partial likelihood approach to the imputed data. This is 
called a two-stage approach, where the longitudinal process is imputed at 
the first stage before the partial likelihood approach is employed to estimate 
parameters in the survival model at the second stage. The most common 
imputation method is to use the last and most recent value of the patient 
to impute a missing value, the so-called last-value-carry-forward method, 
which has been adopted in standard software such as SAS and R. Additional 
two-stage procedures were developed by Tsiatis, DeGruttola and Wulfsohn 
(1995) and Dafni and Tsiatis (1998). 

It is easy to foresee serious biases with such an imputation method if the 
follow-up schedule is infrequent over time and also when the longitudinal 
covariates are contaminated by noises or measurement errors. Both scenar- 
ios provide strong motivation to find alternative approaches. The approach 
developed by Wulfsohn and Tsiatis (1997) to model the survival and longi- 
tudinal data simultaneously through their joint likelihood is attractive on 
two counts: (i) the resulting parametric estimators are semiparametrically 
efficient when the baseline hazard function is unknown, and (ii) the joint 
likelihood procedure is often insensitive to the normality assumption on the 
longitudinal data, if there is a reasonable number of repeated measurements 
available for the longitudinal processes; see Zeng and Cai (2005) and Dupuy, 
Grama and Mesbah (2006) for (i) and Song, Davidian and Tsiatis (2002), 
Tsiatis and Davidian (2004) and Hsieh, Tseng and Wang (2006) for (ii). 

The above joint likelihood approach not only successfully removes the bi- 
ases on the survival component but also leads to efficient estimation. A his- 
torical example for the joint likelihood approach is the investigation of CD4 
T-cell counts as a biomarker of time-to-death or time-to- AIDS [DeGruttola 
and Tu (1994), Wulfsohn and Tsiatis (1997), Henderson, Diggle and Dob- 
son (2000)]. In these and other works, the survival time is subject to the 
usual right censoring. However, left truncation is common for studies with 
delayed entry. Specifically, if the recruitment of patients continues after the 
onset time of a study, those that have already experienced the event are 
often excluded from the study, which then results in left truncation of the 
event-time. Patients who remain in the study are further subject to the usual 
right censoring, so the sample consists of left-truncated and right-censored 
(LTRC) survival times. It is well known that left truncation is a biased sam- 
pling plan as subjects with shorter survival times tend to be excluded from 
the sample. As a result, the longitudinal measurements are also sampled 
with bias. 

An example of left-truncated and right-censored longitudinal study is the 
Italian multi-center HIV (human immunodeficiency virus) study [Rezza et al. 
(1989), The-Italian-Seroconversion-Study (1992)], where the primary end- 
point is the time from HIV positive to AIDS onset, that is, the incubation 
period of AIDS. In this study, patients who have developed AIDS at the time 
of recruitment were excluded from the study, resulting in left truncation of 
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the survival data, and CD4 counts for those who were HIV positive but 
ADIS free were measured at each follow-up visit. As there are no procedures 
available to handle such data properly, we develop in this paper a semipara- 
metric joint likelihood approach to accommodate LTRC survival data with 
longitudinal covariates that are measured intermittently. 

Although there is a sizable literature to jointly model right-censored sur- 
vival and longitudinal data [see Wulfsohn and Tsiatis (1997), Henderson, 
Diggle and Dobson (2000), Song, Davidian and Tsiatis (2002) and the review 
papers by Tsiatis and Davidian (2004)], the extension to LTRC survival data 
turns out nontrivial due to the left-truncation feature of the data. To see this, 
consider first the simpler case of left-truncated data with time-independent 
covariates or no covariates at all. Lynden-Bell (1971), Woodroofe (1985) and 
Wang (1987) investigated estimation of the survival function when subjects 
come from the same population, that is, there are no covariates involved. 
Here, one only needs to adjust the risk set for truncated data to reach 
a suitable extension of the Kaplan-Meier estimator. For time- independent 
covariates Andersen et al. (1993) considered estimation under the Cox model 
and showed that the partial likelihood approach for right-censored data still 
works for LTRC survival data when one conditions on the values of the 
covariates and truncation times. 

For time-dependent covariate, the Andersen et. al. (1993) partial likeli- 
hood approach can still be employed if the entire covariate history is avail- 
able for all subjects. This is not the case for longitudinal covariates that 
are observed intermittently at discrete time points. Since imputation meth- 
ods lead to biases of the estimates, bias corrected approaches have been 
employed in the literature for right-censored data with longitudinal covari- 
ates. In particular, Wang (2006) proposed a method to correct the bias 
through the partial score equation. Such an approach is termed "corrected 
score" methods, which originates from studies of measurement errors. While 
corrected score methods typically lead to -y/n-consistent estimators for the 
regression parameters in the Cox model, they are not efficient and easy to 
derive. Extension of the corrected score methods to LTRC (left-truncated 
and right-censored) data might be feasible but have not been explored. In 
this paper, we adopt the full and joint likelihood approach of the survival 
and longitudinal data due to its aforementioned efficiency and robustness 
features. Unfortunately, direct maximization of the full joint likelihood is 
much more complicated than the cases with no left truncation. We discov- 
ered a modified likelihood that is simpler, yet retains the efficiency of the 
full likelihood approach, as described in Section 2. 

The rest of the paper is organized as follows. In Section 2, we introduce 
a joint model setting for both the survival time and longitudinal processes 
and propose a modified likelihood approach for statistical inference. An EM 
algorithm to maximize the modified likelihood is derived in Section 3, along 
with the large sample properties of the nonparametric maximum modified 
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likelihood estimator (NPMMLE), including consistency, asymptotic normal- 
ity and efficiency. Numerical performance of the proposed estimating pro- 
cedure is validated through simulation studies in Section 4 and illustrated 
through the Italian HIV study in Section 5. Section 6 contains some discus- 
sion. 

2. Joint modeling under LTRC. We consider the setting that the sur- 
vival time Y* of a subject is subject to random left truncation by T*, so 
a subject is enrolled in a study only if Y* > T* . Let n be the total number 
of subjects enrolled in the study. With such a biased sampling plan, to avoid 
confusion of notation, we denote the survival and truncation time of the ith 
enrolled subjects as (Yi,Ti), which are sampled from the joint subpopulation 
of (Y*,T*), where Y* > T*. Upon entering the study, these n subjects are 
subject to the usual right censorship, so the final observed survival data for 
the ith subject is a triplet (Tj, Zj, Aj), where Zj = min(l^, d) is the time of 
the endpoint event or drop-out (censoring) time Ci, whichever occurs first, 
and Aj = I{Yi < Ci) is the censoring indicator. 

In reality, drop-out or censoring only occurs when a subject is enrolled into 
the study. This fact implies that the right-censoring time Ci is greater than 
the truncation time Tj, for i = 1, . . . , n. Therefore, we introduce a positive 
random variable Ui to represent the time from entry into the study to drop- 
out from the study, that is, Ui = Ci — T. L . 

In addition to the survival data, baseline and longitudinal covariates are 
collected intermittently for the zth subject from the time the subject en- 
ters the study until the observational limit Zj. This results in rii repeated 
measurements, denoted by Wi = (Wu, W$2, • • • , Wi ni ), where the measure- 
ments are taken at time points Sj = (sn,Si2,. ■ ■ ,Si ni ). It is important to 
make a note here that the observed Wi are also subject to the same biased 
sampling plan as the survival data, so there is a background longitudinal 
vector, which we will denote as W* for the ith subject enrolled in the study. 
Therefore, Wi is sampled from the subpopulation of W*, where Y* > T*, 
and values beyond Zj are not observed. For simplicity of notation, we assume 
in this section that there is only one longitudinal covariates, but additional 
longitudinal or baseline covariates can be handled easily and the AIDS data 
discussed in Section 5 contain two longitudinal covariates, one observed in- 
termittently but the complete history of the other one, the time-dependent 
treatment indicator, is available. 

2.1. The joint models. Since repeated measurements from the same sub- 
jects are likely to be correlated, we introduce a latent qxl random vector A* 
to account for their dependency and assume a common parametric density 
function f\{-\a) with an unknown parameter a for A*. A linear mixed effects 
model will be considered for the longitudinal covariate 

(2.1) W*=X(si)+e l = g(si)A*+e i , 
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where <?(•) is a known g-dimensional function and the 7ij X 1 vector S\ plays 
the role of measurement errors, sampled from a multivariate normal distri- 
bution with independent marginal distribution AA(0,cr 2 ), and independent 
of all other aforementioned random variables. 

For the survival time Y*, a proportional hazards model is employed, and 
the hazard rate of Y* at time t given A* is 



where Ao is the baseline hazard rate and /3 is the regression coefficient. The 
truncation time T* and the time U, from entry to drop-out, are assumed 
to have distribution function Ft*(-) and Fjj(-), respectively. We adopt the 
standard assumption in survival analysis, that Y*, T* and Ui are condi- 
tionally independent given the covariates. This is equivalent to assuming 
conditional independence of Y*, T* and U given the value of A*. We also 
assume that T* and U are independent of A*, and the parameters in the 
models for either the survival or longitudinal parts are noninformative. 

2.2. A modified likelihood approach. For the model described in the pre- 
vious subsection, the parameters of interest are (/3, a, a 2 and Ao(-)), where 
the first three components are in the Euclidean space whereas Ao(i) = 
J" Xo(u)du, the cumulative hazard function, is in a functional space, hence 
the model is semiparametric. Since a likelihood approach usually provides 
the most efficient estimating procedure, we first consider the full likelihood 
function Lf based on the observations (tj, Z{, 5i, Wi) from the ith subject. 
The derivation of the full likelihood from the ith subject is shown below: 



(2.2) 



\ Y *(t\A*) = \ (t)eMPXi(t)) 



Li = f(T,YA,w)(U,Zi,5i,Wi) 



f(T*,Y*,A*,W*)(ti> z i^ii w i) 

P(Y* > T*) 




x f w *( Wl \A* = ai)f A *( ai )dai \f T *(ti)/V(Y* > T*) 



(2.3) 






xfT*(ti\Y*>T*) 
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where fy is the density function of the random variable V in the subscript, 
and Sy is the corresponding survival function. In (2.3), besides the baseline 
hazard function Ao, the density function fx* also serves as a nonpar ametric 
component. Because of these two nonparametric components, the full like- 
lihood function is unbounded, so we resort to the nonparametric maximum 
likelihood approach, which leads to a similar scenario as in conventional 
survival analysis that the full likelihood is the same as the conditional likeli- 
hood given the left-truncation time. This has been explored in the literature 
[Andersen et al. (1993), Klein and Moeschberger (2003)] for LTRC data with 
baseline covariates, and was first explored in Wang (1987) for the simpler 
situation of left-truncated data that came from a single population. Follow- 
ing a similar argument as in Wang (1987), we found that the full likelihood 
can be simplified to the following conditional likelihood for the ith subject as 



Next, we consider the nonparametric maximum likelihood estimators 
(NPMLE) of the survival component, which, by a similar argument for joint 
modeling right-censored data and their longitudinal covariates [Zeng and Cai 
(2005), Dupuy, Grama and Mesbah (2006)], leads to a piecewise linear base- 
line cumulative hazard function with jumps at each uncensored event time 
(i.e., at Yi, whenever Aj = 1). Let n u denote the total number of uncensored 
events, the baseline cumulative hazard function is thus re-parameterized as 
a n u -dimensional vector. 

So far, the derivation of the likelihood function and NPMLE follows a sim- 
ilar path as the much investigated case of a joint modeling setting with right- 
censored data, where NPMLE's for the parametric component enjoy nice 
asymptotic properties and are semiparametrically efficient. Despite these 
similarities, the left truncation feature triggers complications in the estima- 
tion of the finite dimensional parameter in the joint LTRC model. First, as 
shown in the Appendix, the parameter a associated with the latent vari- 
able A* is not identifiable. This is a consequence of the biased sampling 
plan, since the samples are actually drawn from the subpopulation Y* >T*. 
Consequently, only E(A*\Y* > T*) and var(A*\Y* > T*) could be identi- 
fied under the normality assumption. Thus, while it is possible to identify 
the unknown parameters of Y* and T* based on the joint conditional dis- 
tribution of (Y*,T*)\Y* > T*, where the notation (-\Y* > T*) stands for 
a random variable/ vector sampled from the subpopulation with Y* > T* , 
there is not enough information to recover E(A*) and var(A*) and hence 
the true longitudinal parameters a. 

A second complication is that the score equations for the survival com- 
ponents, P and Aq, are much more complicated than the situation under 



(2.4) 



L? = / [fy* (zi\Y* > U, A* = ai )] 5i [S Y * (z l \Y* > t h A* = a,)] 




l-Si 
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a right-censored only model and, as shown in Appendix A.l, as they require 
estimation of the expectations of nonlinear functions of the observed data 
along with the the parameters of interest. This motivates us to modify the 
likelihood so as to simplify the estimation of all parameters that are identi- 
fiable. Our proposal is to aim at the following modified likelihood, denoted 
by L m , as an alternative of the full, also the conditional, likelihood in (2.4). 
The modified likelihood is 

L m = fl{ [[f Y *(z t \Y*>t u A* = a t )] s > 
i=i ^ 

(2.5) x [S Y ^z l \Y*>t u A* = a i )] 1 ~ s ' 

x f w * ( Wl | A* = a t ) f A . (a, ) dai ) fx* (<* \Y* >T*), 



where the lower case variables denote the values of the corresponding upper 
case variables, for example, 5i is the value of Aj. The estimators obtained 
by maximizing the modified likelihood, where the nonpar ametric cumula- 
tive hard function is replaced by a step function will be referred to as the 
nonparametric maximum modified likelihood (NPMMLE) hereafter. 

The difference between (2.4) and (2.5) is that fA*(cii\Y* > tj) in the full 
likelihood (2.4) is replaced by in (2.5). This is motivated by the 

fact that f A *(a\Y* >t) = Sy f^ a) Sa* (a) and E[ %'^p ] = 1, for any t, 
and that, as shown in Lemma A.l in the Appendix, the score functions of 
the survival parameters from (2.5) are asymptotically the same as those 
from (2.4). Theoretical results in the next section and numerical evidence 
in Section 4 demonstrate good performance of estimators of all the survival 
parameters, (/3, Ao(-)) and of the measurement errors a 2 of the longitudinal 
component that we derived from this modified likelihood. 

3. EM-algorithm and asymptotic properties. Let 7 = (f3,a,a 2 ) be the 

finite dimensional parameter in the joint survival and longitudinal model, 
and A be a step function. The log modified likelihood is 



n „ 

r( 7 ,A) = ^ln J [A^lexp^H]^ 



xexpj- Myj}exp{/?£(y°)ai} 

j-U<y°<zi 

X (2TTO- 2 )- n ^ 2 

{rrii \ 
- y^[wjj - g(sij)ai] 2 /(2a 2 ) \f A *{ai)dai, 
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where A{-} is the jump size of A at the respective time point in the argu- 
ment, and yj is the jth sorted observed survival time in increasing order. 
Moreover, t\ and T2 denote the lower bound of truncation time and the 
largest censoring time corresponding to the end of the study. 

Since direct maximizing the proposed modified likelihood involves inte- 
gration of a complex function with respect to the random effects, we employ 
the expectation-maximization (EM) algorithm [Laird and Ware (1982)] to 
stabilize the maximization procedure. In the implementation of the EM al- 
gorithm, a Monte Carlo integration approach is used to approximate the 
expectation terms of functions h(A*) appearing in the E-step. A one-step 
Newton-Raphson method is applied to solve the nonlinear equations in the 
M-step. The posterior density of the random effects A* given the observed 
data from the ith subject, Oj = (ij, Zi,5i,Wi), is of the form 

, , , N f(Y,A)\(A,T)(zi,Si\a,ti) X fA*\W*(a\Wi) 

JA*\0\ a \ i) ~ 



f f(Y,A)\(A,T)(zi,Si\a,ti) x f A *\ w *(a\wi)da 



= [A{^exp|- A{yJ}exp{/3 5 (y5)a}J x f A . ]w .(a 

j-U<y°<zi 

/|[A{*}]*exp{- A{yO}exp{/My>}} 

j-ti<y°<Zi 

x fA*\w*(a\wi)da. 

For a simpler implementation of the algorithm, we shall impose a normal 
assumption on the random effects and assume that A*, i = 1, . . . ,n, follow 
a normal distribution N(fi,T,), where E) plays the role of the parame- 
ter a. 

By taking the first derivative of the log modified likelihood calculated 
in the E-step with respect to each parameter, the NPMMLE, j3, {Xk,k = 
1, ...,n u }, a, (x and £, can be obtained through the following formulas, 
where is the jump size of A at the /cth sorted observed survival time: 

Xk = E l , l<y o< Zl HeMmy k )At}\o l V k = 1 >---^ 

^ n m 

= J2J2 E i(wij - g(sij)A*) 2 \oi], 

2^=i n i i=1 j=1 

1 n 1 n 

£ = - £ E(A*\ 0i ), S = -£ E[{A* - fi)(A* - (i) T \ 0l \ 

i=l i=l 
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and (3 is the root of the score s(/3), which is solved by an one-step Newton- 
Raphson method with the updating rule 

s(A,ld) 



/^new — Pold 



where 

n 



g{z % )E{A*\ 0i ) 



s'(/3oid) ' 

>:,:/ ^ , , E{g( Zl )A* exp{(3g( Zi )A*}\ 0j : 



T.r.t ] <z l <z ] E{^v{^)A*}\o 3 ) 
E J:fj <^ E{g{ Zl )A* e W {P9(zi)A*}\o 3 ) 

E j :t ] < Zl < Zj E((g(z l )A*) 2 exp{Pg(z l )A*,}\o. 
Ylj:t j <z i < Zj E (ex.-p{pg(zi)A^}\oj) 

Except for a, the proposed nonparametric maximum modified likelihood 
estimates (NPMMLE) of the parameters enjoy nice properties that are sim- 
ilar to the NPMLE, as illustrated in the next two theorems. Below, we list 
some regularity conditions needed for the theorems: 

(CI) The parameter space of the finite dimensional parameters, S' 7 , is bound- 
ed and closed on Euclidean space. The true value 70 is an interior point 

Of Sy. 

(C2) On the parameter space of /3, (exp{/3g(S)A*}\Y* > T*) is bounded 

below by m and above by M with probability 1. 
(C3) P(T < n and Y > t%) > 0. This ensures that not all data are truncated 

or censored. 

(C4) E do {exp[f3 g(u)A*}I(T* <u< Y*)\Y* > T*} is bounded away from 
on the parameter space of /?. Here Eg (-) stands for the expectation 
taken under the true value of the parameter 9q. 

(C5) g(t) is of uniformly bounded variation on [t\ , T2] , and there exists a con- 
stant D such that P(rii < D) = l,Vi. 

(C6) The distribution fA*(-\a) is continuous with respect to a and has con- 
tinuous second derivative with respect to a. Moreover, the Fisher in- 
formation matrix obtained from Ja* for a is positive definite. 

Theorem 1 (Consistency of the estimators). Under the regularity con- 
ditions C1-C5, the NPMMLE of ((3q,o-q, Ao) , denoted as ((3 n ,a 2 ,A n ), is con- 
sistent under the Euclidean norm \ ■ \ and supremum norm || • ||oo on [t±,T2\, 
respectively. 

For H = {h = (/ii, /i2 5 ^3)} an d < p < 00, let H p = {h £ H : \\hi\\, \h,2\, 
II ^3 1| v < p}> be a collection of directions that are used in the Appendix. The 
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notation || • ||„ denotes the the total variation of the function in the norm 
plus the absolute value of this function evaluated at 0. The next theorem 
shows that the NPMMLE converges in distribution to a Gaussian element 
in the parameter space at a y'n-rate. 

Theorem 2 (Asymptotic normality and efficiency). Under the regular- 
ity conditions C1-C6, the process ^fn{a n — E(a n ), a\ — Uq, (3 n — (3q, A n — Ao) 
converges in distribution to a mean zero Gaussian process G in the functional 
space loo(Hp) on H p . Moreover, the NPMMLE (3 is semiparametrically effi- 
cient for (3q. 

Proofs of these two theorems are provided in the Appendix. 

For estimating the standard errors of the NPMMLE, we recommend to 
use the bootstrap procedure instead of the profile likelihood approach in 
Murphy and van der Vaart (2000) and Zeng and Cai (2005), which did not 
work well for LTRC data due to the high fluctuation of the estimated profile 
likelihood function and possibly negative estimate of the standard error. 
The performance of the bootstrap procedure for estimating the standard 
errors of the NPMLE under joint modeling with right-censoring cases has 
been studied by Tseng, Hsieh and Wang (2005) for the accelerated failure 
time model, and by Hsieh, Tseng and Wang (2006) for the Cox model. 
The results in these two papers and support the validity of the bootstrap 
method in the scope of joint modeling. Our simulation results reported in 
Section 4 also supports the use of the bootstrap approach. In comparison, 
the bootstrap method is more reliable than the profile likelihood method at 
a higher computational cost. 

4. Simulation study. To verify numerically the validity of the proposed 
procedure, we conducted simulations under five different settings. Since there 
is an intrinsic bias on the longitudinal component, the simulations focus on 
the performance of the estimate of f3 and how it would be affected by the 
level of contamination from the measurement errors and the variation of 
the random effects. As a benchmark setting, we considered a linear trend in 
time with random effects on the longitudinal covariate and assess the influ- 
ence of the variance of the random slope on the accuracy of estimating /?. 
The left-truncation times are generated from an exponential distribution 
with parameter 1, while the right-censoring times are from an exponential 
distribution with parameter 3. The baseline hazard rate is from an expo- 
nential distribution with mean 1. All 5 simulation settings have sample size 
n = 200 with true values = 1, (i = (2,0.5) and (a u ,a 12 ) = (0.5,-0.001). 
The values of (0-22, o~ 2 ) are different for the five settings and set as: (0.01, 0.1), 
(0.01,0.4), (0.01,0.025), (0.0025,0.1) and (0.04,0.1). The first three settings 
demonstrate the impact of contaminations by measurement errors while the 
last two illustrate the effect of the variation of the random slope. 
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Simulation results based on 100 Monte Carlo samples are reported in Ta- 
ble 1. Results under the first three settings suggest that f3 can be estimated 
unbiasedly, and measurement errors affect the precision, but not the magni- 
tude of the biases. As expected, higher level of noise contamination leads to 
less precise estimate of j3 and higher chance of divergence in the algorithm. 
In all three settings, the variance of measurement errors can be estimated 
with high accuracy and precision. Comparing with the results under the 
first, fourth and fifth setting from Table 1, we observe that the variance of 
the random slopes has little effect on the performance of (3. 

The results for the longitudinal part echo the above discussion of the non- 
identifiability of the parameter a, as the means of the random intercept and 
random slopes (shown in the second column of Table 1) are consistently un- 
derestimated. The actual targets of the estimates are the conditional quanti- 
ties marked as \x\ and \i\ etc. in the first column of Table 1. The sizes of the 
biases vary with the level of truncation probability and size of measurement 
errors and can be very small for the mean of the random slope, for example, 
in setting 3, where the measurement error is small. Thus, this bias problem 
in estimating the longitudinal component may elude researchers, while it is 
a cause of substantial concern in settings with large error variances. 

To make statistical inference about the parameters of interest, it is nec- 
essary to get an estimate of the standard error of the NPMMLE, especially 
for f3. We tried the approach in Murphy and van der Vaart (2000) and Louis 
(1982), but neither works, so we propose to use a bootstrap method [Tseng, 
Hsieh and Wang (2005)] for estimating the standard error of the NPMMLE 
and present the results in Table 2. Only the results for estimating the stan- 
dard errors of /3 and o 1 are shown, since they are estimable. Table 2 supports 
the use of the bootstrap procedure, as the estimated standard error from 
the bootstrap method is close to the standard deviation from the 100 Monte 
Carlo samples, even when the degree of error contamination is large or the 
random slopes vary widely. 

5. Data example: Multi-center HIV study. In this section, we conduct 
an analysis on the data from a multi-center HIV study in Italy. Details of 
the study design and a previous analysis can be found in Rezza et al. (1989) 
and The-Italian-Seroconversion-Study (1992). There were 448 HIV-positive 
patients in the data. The primary event of interest is the incubation period 
of acquired immunodeficiency syndrome (AIDS), that is, time (in years) 
from detection of HIV- infection until the onset of AIDS. There were 140 
patients who received the HA ART treatment at various times, resulting in 
a longitudinal treatment indicator that is fully observable, so no modeling of 
this process is necessary. However, there is a second longitudinal covariate, 
the CD4 counts, that are observed only intermittently at follow-up visits, 
motivating the need to model the survival and longitudinal covariates jointly. 
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Table 1 

Simulation results under five settings with sample size 200 and varying values of 022 
and a 2 . The actual targets of the longitudinal estimates are conditional quantities marked 

as fil and /xj etc. and are listed next to the true longitudinal value in the first column. 
The mean and SD of the estimates based on 100 Monte Carlo samples are reported in the 

second and third column 



Average of 



Parameter 




orillVLIU ) 


A/f CI? 


Convergence rate 


p (i) 


0.9923 


0.1633 


0.0267 


98% 


a 2 (0.1) 


0.0998 


0.0021 


5e-6 




m/nl (2/1.73) 


1.7461 


0.0478 


0.0668 




M2//12 (0.50/0.50) 


0.4545 


0.0985 


0.0118 




au/au (0.50/0.45) 


0.4634 


0.0527 


0.0041 




^12/012 (-0.001/-0.001) 


-0.0424 


0.0453 


0.0038 




0-22/0-22 (0.01/0.01) 


0.0738 


0.0409 


0.0057 




p (1) 


0.9185 


0.1765 


0.0378 


72% 


a 2 (0.4) 


0.4003 


0.0086 


7e-5 




m/ti (2/1.74) 


1.7455 


0.0531 


0.0676 




m/nl (0.50/0.50) 


0.3801 


0.1640 


0.413 




ffii/tr?! (0.5/0.45) 


0.4730 


0.0505 


0.0033 




o-i2/o-{ 2 (-0.001/-0.001) 


-0.1122 


0.0917 


0.0208 




0-22/0-h (0.01/0.01) 


0.1856 


0.1156 


0.0442 




P (!) 


1.0380 


0.1548 


0.0254 


96% 


a 2 (0.025) 


0.0250 


4.8283e-4 


~0 




m/iA (2/1.73) 


1.7443 


0.0468 


0.0676 




IM»/H2 (0.50/0.50) 


0.4900 


0.0643 


0.0042 




crii/o-n (0.50/0.45) 


0.4520 


0.0534 


0.0052 




0-12/0-I2 (-0.0004/-0.0004) 


-0.0193 


0.0338 


0.0015 




0-22/0-^ (0.01/0.01) 


0.0571 


0.0219 


0.0027 




P (1) 


0.9684 


0.1504 


0.0236 


98% 


a 2 (0.1) 


0.0997 


0.0023 


5e-6 




m/txt (2/1.74) 


1.7464 


0.0460 


0.0664 




M2//12 (0.50/0.50) 


0.4491 


0.0948 


0.0116 




dnKi (0.5/0.45) 


0.4497 


0.0423 


0.0043 




a-ia/ffia (-0.001/-0.0007) 


-0.0439 


0.0518 


0.0045 




0-22/0-22 (0.0025/0.0025) 


0.0797 


0.0479 


0.0072 




P (1) 


0.9934 


0.1567 


0.0246 


95% 


a 2 (0.1) 


0.0996 


0.0020 


4e-6 




m/ri (2/1.74) 


1.7522 


0.0464 


0.0636 




M2/M2 (0.50/0.50) 


0.4498 


0.1168 


0.0162 




an/an (0.50/0.45) 


0.4559 


0.0442 


0.0039 




0-12/0-I2 (-0.001/-0.002) 


-0.0433 


0.0692 


0.0066 




0-22/0*22 (0.04/0.04) 


0.1186 


0.0642 


0.0159 
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Table 2 



Performance 


of estimated variance, 


SE(BT), 


of f3 and a 2 through bootstrap 


with 50 






resamples 




Case 


Parameter 




SE(MC) 


SE(BT) 


1 


P (1) 




0.1633 


0.1692 




a 2 (0.1) 




0.0021 


0.0020 


2 


P (1) 




0.1765 


0.1813 




a 2 (0.4) 




0.0086 


0.0091 


3 


P (1) 




0.1548 


0.1523 




a 2 (0.025) 




4.8283e-4 


5e-4 


4 


PW 




0.1504 


0.1539 




a 2 (0.1) 




0.0023 


0.0020 


5 


(1) 




0.1567 


0.1531 




a 2 (0.1) 




0.0020 


0.0021 



The main biomedical interest lies in determining the effect of the HAART 
treatment on reducing the risk of developing AIDS, and the association 
between the incubation period of AIDS and CD4 T-cell counts in HIV- 
infected subjects. 

For each of the 448 subjects in the study, the longitudinal measurements of 
CD4 T-cell counts were recorded intermittently along with the time to AIDS 
or dropout from the study. The total number of longitudinal measurements 
is 4442 and the average number of longitudinal measurements is 9.92 per 
patient. 

One feature of this data is that the incubation period is subject to left- 
truncation and right-censoring, since patients were recruited to the study at 
various times after the study began, and only patients who have not devel- 
oped AIDS at the time of recruitment are included in the study. Moreover, 
only 147 out of the 448 patients (about 33%) developed AIDS by the end of 
the study, so the right censoring rate is quite high for this data. 

To model the longitudinal CD4 counts, we adopt a linear mixed effects 
model on log(CD4 + 1) with changing intercepts and slopes at the time of 
HAART treatment. Thus, 

W*(s lj )=X l (s ij ) + e ij = A* + A* 1 s lj + A* 2 I(s ij > V l )+A* 3 s lj I(s ij > VJJ+ey, 

where is from a normal distribution N(0,a 2 ), A* = (A* , A* ± , A* 2 , A* 3 ) is 
from a 4-dimensional multivariate normal distribution with a 4 x 1 mean 
vector fj, and a 4 x 4 covariance matrix E, and Vi represents relative age 
since HIV-positive of receiving HAART. For those who have never received 
HAART, Vi is defined to be infinity. For the time-to- AIDS, we assume a Cox 
model with Xi(t), CD4 counts, as an time-dependent covariate along with 
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another time-dependent treatment indicator, lit > Vi), which is completely 
observed. The resulting model is 

X(t\A*) = Ao(t) exp(A^(t) + fol{t > Vi)). 

From the EM algorithm with Monte Carlo approximation, the slope, fix, 
for the underlying log(CD4 + 1) process is estimated to be —0.5762 (p- value 
< 0.001), while the slope, fa, for the longitudinal treatment indicator is esti- 
mated to be —1.2189 (p-value < 0.001). As expected, CD4 counts are nega- 
tively associated with the risk of AIDS. One unit of decline on log(CD4 + 1) 
is associated with an increasing risk of AIDS by 78%. In addition to its ef- 
fect on CD4 counts, HAART has an additional effect on reducing the risk of 
AIDS. It significantly reduces the risk of developing AIDS by 70% after con- 
trolling for the CD4 counts. Through the analysis, we confirm that HAART 
effectively reduces the risk of developing AIDS both through a positive as- 
sociation with patients' CD4 counts and the risk to develop AIDS. 

6. Conclusions and discussion. We have shown, both theoretically and 
empirically, that joint modeling the time-to-event and longitudinal covari- 
ates is an effective modeling approach when the time-to-event is subject to 
both left truncation and right censoring. However, the extension from right- 
censorship to LTRC is not trivial. By modifying the joint likelihood, we have 
shown that NPMMLE leads to consistent and asymptotically efficient esti- 
mation of the survival component and measurement error variance under 
the setting of a semiparametric Cox model. We have also demonstrated that 
the corresponding EM algorithm to locate the NPMMLE has good empiri- 
cal performance and asymptotic properties under the assumption of normal 
random effects. It is not only computational effective but also robust against 
departures from the normality assumption. 

However, one caveat is the estimability of the longitudinal component. 
Although we can recover the conditional distribution of the longitudinal 
parameter, a, given Y > T, the parameter a itself can not be estimated 
properly though the modified likelihood due to the biased sampling plan. 
Additional strong and possibly unverifiable assumptions might be needed 
in order to recover the parameter a of the random effects. What we have 
accomplished in this paper is to successfully remove the bias for the esti- 
mation of the survival components attributed to the discrete measurement 
schedule and measurement errors of the longitudinal covariates, thus per- 
mitting asymptotically valid and efficient inference for the survival related 
parameters, which are crucial for the evaluation of therapies. 

APPENDIX 

A.l. Likelihood and the score equations. By imposing a normality as- 
sumption N(fi, S) on the random effects A*, the full likelihood in (2.3) from 
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the ith subject becomes 



/OO n i 
expl 8iPg(zi)ai - - g(sij)ai} 2 /(2a 2 ) >Qi(zi, a;) da. 



oo /-oo 



3=1 

Qi(t,ai)fT(t)daidt, 

i 

where Qi(u, a) = exp{— J " exp[/3g(t)a] dAo(t) — (a — /x) T S _1 (a — /x)/2}. Fol- 
lowing similar arguments as in Wang (1987) and combining with Vardi 
(1985), we can prove that the NPMLE's of all finite-dimensional param- 
eters are the same as those from the conditional likelihood of (zi,8i,Wi) 
given (Y* > ti). Moreover, by a proof similar to that of the classical Cox 
model for right censored data, the NPMLE from the conditional likelihood 
is attained by discrete baseline hazard functions that assign positive masses 
only at uncensored survival times, (y®, . . . ,Un u )- 

Let Oi = (ti, Zi,5i,Wi) denote the observed data for the ith subject. The 
first derivative of the log full likelihood leads to the following score functions: 



n ( ni 



S„2 



Y.) J2 E l w H - A l9{s l3 )\oi\ 2 - n l( j 2 \ I cj- 
i=l lj=l J 



S ° = S- 1 J2E{(A* - - [E(A*\Y* > T*) - M ]| 0i } 
i=i 

n 

= S- 1 J2[E(A*\ 0i ) - E(A*\Y* > Tl)], 
i=i 

n 

s °* = 2 s " 1 E^ A * - AW - ^) T N 

i=l 

- E[{A* - fj,)(A* - v) T \Y* > T*}}^ 1 , 
s °^ = ir- £ E{exp[Pg(y k )A*}\a t }-Q 2 (y k ), 



r.U<yl<Zi 



i=i 



£ Yl A J^Mf °K exp[^ fl (yj)^]| 0i } - Q 3 



i=1 j-k<y°<zi 
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where 



Q2{y) 



]T E{eMPg(y)A*]\oi} 



i:y<U 



- nE{eM/3g(y)A*)I(y < Ti)\Y* > T*}, 



Q3 




The score equations, s° and s^, corresponding to the longitudinal data 
reveal that the estimable terms are the conditional mean and covariance 
matrix of the random effects given that Y* > T rather than \x and S. 

The score functions for k = 1, . . . ,n u , and (3 have more complicated 
forms than those from a partial likelihood under standard Cox model subject 
to LTRC. The complication is due to the additional terms Q2 and Q3, which 
require estimation of the expectations of nonlinear functions of the observed 
data along with the the parameters of interest. If we drop these two terms 
from s^ k and s^, the modified score functions, s\ k = s^ k +Q2(?/fc) and sp = 
s °fi + ^3) are exactly the score functions from the modified likelihood. The 
next Lemma validates the use of the modified likelihood (2.5). 

Lemma A. 1 . (i) E 9o (s Ak ) = E 0O (s° Ak ) and E 6o ( S/3 ) = E 0o (s° p ) . This pro- 
vides Fisher consistency of the estimators (2.5). 

(ii) Under the regularity conditions for law of large numbers and Slutsky 
theorem, n~ 1 (s\ k — s^ k ) = o p (l) and n~ l {sp — s°a) = o p (l). 

Proof. The proof follows from simple derivation and applications of 
the law of large numbers along with Slutsky's theorem. □ 

This lemma demonstrates the asymptotic equivalence of the score func- 
tions for the survival-related parameters from (2.3) and (2.5). The latter is 
computationally simpler to maximize and thus more attractive than the full 
likelihood. 

A.2. Proof of the consistency of the NPMMLE. The proof of consistency 
includes four major steps and is elaborated below. 

Step 1. Existence of the NPMMLE of (7, A). We will begin the proof 
that the candidates for the maximizer, A nu , have a finite and bounded jump 
at each observed survival time. For simplicity, we use a vector form \ nu = 
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(Ai, . . . , A„ u ) to express the jump sizes of A nu at ordered survival times. The 
boundedness of the jump sizes can be demonstrated by proving the existence 
of an upper bound B 6 K through apagoge. Suppose that for any arbitrary 
B £R, there exists X nu>B = (Ai, B , . . . , A„„,b) € M n "\[0,B] n " and 7B G S 7 
such that L m ( lB Xu,B) > L m (j,X nu ) for all (7, A n J G 5 7 x [0,5]™". The 
first part in L m (7#, A„ u B ) contributed by the ith subject is bounded above 
by 

(A nu)B {zi}M) 5 > x expj-m ^ Aj, B J, 

where m, M is defined in assumption C2. Since A nuiB G M nu \[0, f?] n " , at least 
one jump size, say Aj b, is greater than f?. It induces that <v°<z B > 

B, and then implies that L m (7 B , A„ Ui b) — > as B — > 00. Thus L m (7, A„ u ) = 
0, for all (7, A n „) £S 7 x R n ", which is a contradiction. It demonstrates the 
boundedness of the jump sizes of A TCu . Along with the compactness of 5 7 
provided by assumption CI, we accomplished the existence of the NPMMLE 
of (7, A). ' 

Step 2. Almost surely boundedness of A(r 2 ) as n — >■ 00. For any fixed 
sample size n, the estimated cumulative hazard function evaluated at the 
endpoint of the study can be expressed as 



A(r 2 ) 

(A.l) 



hl(z k < T 2 ) 



E7=l E § [e W {Pg(z k )A*}\oi]I{ti < z k < Zi ) 
< y> S k I{z k < r 2 ) < ELi S kl(zk < r 2) 



- J l m Yl r i=i I ( t i <Zk< Zi) rnY% =1 I(U < ti)/(t 2 < Zi)' 



where m is the lower bound of exp{/3g(Y)A*}\Y* > T* , which exists under 
assumption C2. By the law of large numbers and the continuous mapping 
theorem, we have the following two limits as n — > 00: 



1 n 

- Y 5 k I{z k < r 2 ) -> E(AI(Y < r 2 )) < 1 and 



fc=l 

(A.2) 

1 1 

(l/")£r=iJ(*i<n andr 2 <Zi) ^ P(T < n and Y > r 2 ) < °°' 

where the finiteness of the second limit is following assumption C3. There- 
fore, there exists an upper bound of A(r 2 ) even when n goes to infinity. 
Moreover, since the terms inside the summation in (A.l) are all strictly 
positive, A(r 2 ) is always greater than 0. Thus A(r 2 ) has been shown to be 
bounded almost surely as n — > 00. 
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Step 3. Uniform convergence of (cr^,/3 n ,A n ) to (<Tq, /?o, Ao). We have 
shown in Step 2 that A(t2) is finite, combining with the fact that A is 
a right-continuous and nondecreasing step function along with the Helly 
selection theorem, there exists a subsequence of A converging pointwisely 
to a right continuous and monotone function A* with probability 1. More- 
over, by the Balzonno-Weierstrass theorem, there is a sub-subsequence of 7 
which converges to some 7** Therefore, there exists a sub-subsequence of n , 
denoted by ^( n ), that converges to 6* = (7*, A*). We next show that 6* = 
(aQ,o-Q, /3o,Ao), where Qq is the limit of a. Here a new term, defined as 

t fr\ = Iy"* Sjjjzk < t) 

nU n ^ (l/n)J^ =1 E eo [eMPo9(z k )A*}\ 0l }I(U <*k< 

is introduced to serve as a bridge between A n and Ao- 

We first show the convergence of A n to Ao as follows. We will use a prop- 
erty that the class of all functions from a closed set to M, which are uni- 
formly bounded and of bounded variation, is Glivenko-Cantelli. Consider 
the denominator of A n . The assumptions imply that functions of the form 
u — > Eq q [exp{(3og(u)A*}I(T <u< Y)\o], where o denotes the observed data 
of a subject, are uniformly bounded and of bounded variation, so the class 
of these functions is Glivenko-Cantelli. Therefore, 



1 n 

-S^Eg^expiPog^A^lo^Iiti <u< Zi ) 

(A.3) 

-+ E„„{exp[A, 9 («)A*]/(r < u <Y)\V> T'\ 

uniformly on [ti,T2]. Along with assumption C4, the uniform convergence of 
the inverse of the right-hand side to the inverse of the left-hand side in (A.3) 
holds. Moreover, uniform boundedness and bounded variation of functions 
t — > A/(y < t) imply the Glivenko-Cantelli property of the class consisting 
of them. Thus we also have 

1 " 

(A.4) -J2^iI(Yi<t)^E 9o [AI(Y<t)} 

i=l 

uniformly on [ti,T2]. Since 
A (t) = E 



AI(Y<t) 



E{exp[p g{Y)A*]I(T < u < Y)\Y* > T*}\u = Y 

combining the convergence of the inverse of both sides in (A.3) and (A.4), we 
obtain A n converges uniformly to Ao on [t\ , T2] . By considering the uniform 
convergence of the ratio of A{u}/A{u} to dA* (u)/dAo (u) for u S [ti,T2], as 
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demonstrated on pages 2146-2147 in Zeng and Cai (2005), the uniform con- 
vergence of A to A* is established. The remaining task is to prove the equiva- 
lence of 6* = (7*, A*) and 9q = (/3o, cr°, a*, A ). This can be done by consider- 
ing the empirical mean of the distance between /™(# n ) and if 1 (A), a°,a* ,A n ) 
and demonstrating that E#* [l m (6*) / l m (6^)] = almost surely as shown on 

page 910 in Dupuy, Grama and Mesbah (2006). Thus (<r 2 ,/3,Ao) converges 
uniformly to (erg , /3 , A ) . 

A.3. Proof of asymptotic normality of the NPMMLE. We will apply 
Theorem 3.3.1 in van der Vaart and Wellner (1996) to prove the asymptotic 
normality of the NPMMLE (7, A). The proof consists of four steps to verify 
each of the four conditions in their theorem. 

Step 1. Frechet differentiability of the score functions. For notation sim- 
plification, the parameter a 2 will be combined with a into 71 = (a 2 , a) so 
that the single parameter 71 denotes the parameter of the measurement er- 
ror e and the latent random variable A*. Thus, the new parameter vector is 
= (7i,& A). 

Consider a one-dimensional submodel along the direction (hi, h 2 , h 3 ) of 
the form 



(ji + thi,P + th 2 ,A t (h 3 )), 



where 



Mhs)(- 



(l + th 3 (u))dA(u), 



hi £ R d , h 2 <ER and h 3 is a bounded- variation function on [0, r 2 ]. Let H = 
{h = (hi, h 2 , h 3 )} and H p = {h G H : ||/ii||, l^li ||^3||« — p}- The notation 
|| • ||„ denotes the absolute value evaluated at plus the total variation of the 
argument. The imputed log-likelihood contributed by the ith subject eval- 
uated at 8, given the current value of parameter denoted as 6, is denoted 
by l§ i(0). The corresponding score function of the local parameter t is 

^- t l § .(e t ) = h 2 E § \s i g(z i )A* 



g(u)A*exp{(P + th 2 )g(u)A*}(l + th 3 (u))dA(u) 



+ hiE; 



h 3 (u) exp{(P + th 2 )g(u)A*}dA(u) 



Oi 



+ 



o~ih 3 (zj) 
l + th 3 ( Zi )' 
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Thus the imputed score function of t contributed by the n subjects evaluated 
at t = is 

n 



(A.5) 
where 



i=l 



t=0 



= ^S nAl (e) + h 2 S n§2 (8) + s nA3 (e)(h 3 ) 

i n r d 



i=i 

n 



I? 



j=i 



t=i 



^(u)i4?exp{^(u)i4J}dAu 



/i 3 (it) exp{/3#(u)>l*} dA(u) 



o, 



By defining 6>(/i) = (71 , /3, A) (hi , /i 2 , /13) = 71 + ^2/9 + f^ 2 h 3 (u) dA(u) , where 
h £ .ffp, the parameter can be regarded as a functional on the parame- 
ter space Q = {9} is a subspace of L°°(H P ) and the score in (A.5) is a random 
map from to a Banach space which contains functions (operations) of h. 

Besides the above imputed score, we also need the mean imputed score 
function of t under the true value #0 an d denote it as 



S § (6)(h)=Ee 
where 

S eA d ) = E 8o{ E 



t=o. 



^S n§A (8) + h 2 S §2 (e) + S §3 (8)(h 3 ), 



SfjJQ) - Eg Q < Eg 



d 



log/ e ,A*(£i,A*|7i) 

Y, 
T, 



g(u)A*exp{/3g(u)A*}dA(u) 



S § Je)(h 3 ) = E d0 \A l h 3 (Yi)-E § 



Y, 



h 3 (u) exp{f3g(u)A*} dA(u) 



T, 



O; 



To prove the Frechet differentiability of the map, 9 — > Sg*(9) at 6q, where 
9q = (7^ ,/3o,Ao) with 7^ = (cr^ct*), we need to calculate the correspond- 
ing derivative. First, we introduce a notation VqS^Oq) = §iSq(9q +t9)\t=o, 
where 0g + tO = (a + ta, (3 + 1/3, A (-) + tA(-)). Then 

V e S § (9* )(h) 



^s § (e* +t9)(h) 



t=o 
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0(7i*o + *7i) 



log/ £; A*(£i,4*l7*o + *7l) 



(A.6) 



+ A i /» 3 (y i ) 



%g(u)A* exp{p + t/3g(u)A*}(dA (u) +tdA(u)) 



0} 



E; 



h 3 (u) 



T, 



x exp{(/3 + t(3)g(u)A*}(dA (u) +tdA(u)) 
Using the chain rule, equation (A.6) can be simplified as 



where 



Oi 



t=0 



"7i c^iO) -f3v§2( h ) ~ / ffffaWM^M' 



(A.7) ^ 1 (/ l ) = -E e5 |^E 
o- - 2 (/ 1 )=E Je - 

(A.8) 



() 2 



T 



log/ £ ,A*(£i,4*l7io) 



[h 2 g(u)A* + 7i 3 («)M«)4* exp{/3 5(u) A*} 

x I(Ti <u< Yi)dA (u) 

°§3( h )( u ) = E e^ E eii h ^9(u)A* + h 3 (u)] exp{j3 g(u)A*} 
(A.9) ' 

x I(Ti <u<Y^\ 0i }}. 
Evaluating (A.6) at the true value 9q leads to 

(A.10) V e S e *(9* )(h) = -y[<Tos,i(h)-p*e$M - f* a e ^{h){u)dA{u), 

Jo 

where each of the cr-function has similar forms as the corresponding function 
listed in (A.7), (A.8) or (A.9) with the double expectation E 6) »{Eg{[-|o i ]} 
replaced by Eg*{-}. Now apply the Taylor expansion of exp{(/3o + tf3)g(u)A*} 
at t = 0, to get 

S e * (6* + tO) - S e * (9* ) - V e S e * (05) = o(t), 
where the small-o function does not depend on 9. Therefore, 

\\S e *(9* + t9) - S e *(9* ) - V e S e *(9* )\\ P n t n 
- - > as t — > 
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uniformly in 8 = (71,/?, A). Thus the Frechet derivative of the mapping 
8 —7- Sg*(8) evaluated at 8q takes the form (A. 10). We will use the nota- 
tion S0*(8q)(8) to denote it. 

Step 2. Continuous invertibility of Sg* (8q)(8). The continuous invertibility 
of the Frechet derivative can be established by showing that there exists some 
number c > such that 



(A.ll) 



. „ \\ s e*(0o)\\i^(H) 

mf -f-r. > c. 

0Slin0 J I P 1 1 joo /m 



Since S0*(8q)(8) can be expressed as a linear combination of the three a- 
operators according to (A. 6), it is necessary to check the continuous in- 
vertibility of those o"-operators. The proof is similar to the arguments in 
the Appendix of Zeng and Cai (2005). Through the continuous invertibility 
of o~e* , the lower bound c can be found as where q satisfies ag~}(H q ) C H p . 
Details to find the lower bound are analogous to the approach in Dupuy, 
Grama and Mesbah (2006) (page 915). Thus the derivative Sq*(8q) is con- 
tinuously invertible. 

Step 3. Convergence in distribution to a tight element. In this step, the 
convergence of y/n(S n g — Sq*)(8q) in distribution will be demonstrated. 
Since S0*(8q) is the mean of the score function evaluated at the true value 
of 8, it is equal to zero. Then 

1 n 

- S6*M)(h) = -Y^i D ^ h ) + D ^ + s Myi) + Di, 3 (h)}, 



i=l 



where 



D ltl {h) = hjE^ 
Di, 2 {h) = h 2 E § 
Di,s(h) = -E, 



^log/ £ ^( £j ,A*l7io) 



8ig(zi)A* 



g{u)A* exp{f3 g(u)A*} dA u 



h 3 (u) exp{f3 g(u)A*} dA (u) 



O; 



The class + B)i^ 2 ){h) : \\hi\\ + \h 2 \ < p} is bounded Donsker, since 

it is a finite dimensional class of measurable score functions. Moreover, since 
any class of real- valued functions on [0, t 2 ] that are uniformly bounded and 
bounded in variation is Donsker, the class {Sh^(y) 1/13 6 BV P } is Donsker. 
The Donsker property of the class Di$(h) : /13 € BV P } also follows from 
this fact. We have thus shown that the class {[S ng ~ — S6i*](#q)(/i) : ||/ii|| + 
\h 2 \ < p,ti3 £ BV P } is Donsker, since the sum of bounded Donsker classes is 
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also Donsker. This implies 

a tight Gaussian process in loo(H p ). 

Step 4. Verification of conditions 1 and 4. Condition 4 holds by the con- 
sistency of the estimator 8 n . Condition 1 can be verified by considering the 
Donsker property of the class {S. t g(9)(h) — 5.^* (#q)(/i) : \\8 — 9q\\ p < v,h £ 
Hp} for some v > 0, where S.^(6)(h) is the general form of Si^{6)[h) = 
^lh,i(dt)\t=o- We omit the details since they are similar to those for the case 
of right-censored data, considered in Zeng and Cai (2005). 

We have verified the four conditions needed for the asymptotic distribu- 
tion of the NPMMLE § n , and therefore 

as n — > oo. 

Using the form of the Frechet derivative in (A. 6), one finds that there 
exists a linear operator a = (ce^i, a 6^,2i °"0o,3) that maps H p to W d+l x BV P , 
such that 

& -(Wi " W) = "(Til - 7i 2 ) T ^ mW - (Pi - P2)<rp M 

ag* t3 (h)(u)d(Ai-A 2 )(u). 

The continuous invertibility of the a operator has been shown already, so 
its inverse operator, denoted by a" 1 , exists. Since 

VfiSg. (^)(7i - 7i*o, P ~ A), A - A )(h) = V^{S n ,e* (h) ~ Se* W)(h)} + o p (l), 
by applying the inverse operator a -1 on both sides we obtain that 

v^{-(7i - 7io) T ^i -0- Po)h 2 - f h 3 (u) d(A - A )(u) 

(A.12) 1 _ . J ° 

= V^{Sn,8z(h)-S e *(9* )(h)} + Op (l), 

where h = {hi,h2-,h 3 ) = a~ l {h). If h\ and /13 in (A.12) are chosen to be 0, 
then this reduces to 

v^{-(/3-A))M 

= yfr{S n ,p {h)-S e .W)(h)} + o p (l) ) 

where the latter term is in the form of linear combinations of score func- 
tions for the parameters. Since score functions derived from the modified 
likelihood is asymptotically equivalent to those from the full likelihood by 
Lemma A.l, the influence function is the same as the efficient influence func- 
tion for /?o^2 by its uniqueness in the linear span of the scores. Thus the 
estimator /3 is efficient for /3q. 
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